Over the last four years, we have developed the
Meta toolkit for controlling distributed applications.
This toolkit has been publicly available
as part of the academic ISIS release, and has 
been used both within and outside of Cornell for 
building various system monitoring and control 
applications [5, 3, 4]. 

One major stumbling block with using Meta 
has been the language (called NPL) it supports. 
NPL is very low-level and using it is difficult, 
in the same way it is difficult to write machine 
language programs or raw PostScript programs.
Hence, we have spent the last six months building
a higher-level language and runtime environment,
called Lomita. Our hope is that with this higher-level
approach, we will be able to write more compli- 
cated Meta applications and thereby concentrate 
more on the use (and limitations) of Meta as an 
architecture. 

This note proceeds as follows. In Section 1, we 
review the Meta toolkit and its intended use. In 
Section 2 we describe our goals with Lomita and 
give an overview of its architecture and language 
syntax. In Section 3 we give a detailed example 
of the use of Lomita by presenting a complete 
program for a load-adaptable service.

1 Review of Meta 

A reactive system architecture partitions the sys- 
tem into two components: an active environment 
and an input-driven control program. The control
program monitors the state of the environment
through a sensor abstraction, and when the state
meets some condition, it alters the environment's
state through an actuator abstraction. Process
control systems naturally have a reactive
architecture, as do system and network monitoring,
software tool integration, debugging, and automatic
system management.

The Meta toolkit assists in the construction of 
distributed and reliable (albeit non-real-time) re- 
active systems. With Meta, one can instrument 
a program with software sensors and actuators 
in order to expose its state for control. Then, a 
control program can be written to monitor and 
control the instrumented programs. The Meta 
architecture interprets the control program in a 
distributed manner in order to supply both lower 
latency and tolerance to partial failures of the 
environment. Furthermore, monitoring and control
are done in a way that guarantees that the
observed global state is consistent and is changed
atomically with respect to the monitoring done by
the control program.

For example, consider a simple computation 
server that accepts jobs and executes them in 
the order received (the pending job requests are
kept in a queue). The load of a server is the 
estimated time needed to complete all submitted 
jobs. As well as being submitted, a job can be 
cancelled and the server can be stopped (losing 
all submitted jobs). 

This server can be instrumented with a sen- 
sor that gives the load of the server and a sensor 
that gives the queue of submitted jobs. It can 
also be instrumented with two actuators: one 
that cancels a job and one that stops the server. 
Then, Meta can be used to construct a service 
out of servers — for example, an actuator can be 
defined that submits a job to the lightest-loaded 
server of a group of servers, and a sensor can be 
defined that gives the average load of a set of 
servers. And, a control program can be written 
that creates additional servers on lightly-loaded 
machines when the average load is too high. Sec- 
tion 3 develops this example more fully. 

There are two steps to managing a distributed 
application with Meta: instrumenting the appli- 
cation and writing the control program. Instru- 
mentation is the more straightforward task. A 
Meta sensor or actuator is simply a procedure 
that is added to the application, where a sensor 
has no side-effects and an actuator changes the 
state of the application and returns success or 
failure. These procedures are registered with 
Meta using a library routine, which also associates
a name and a type signature with the sensor
or actuator. Finally, Meta has a set of library
routines that synchronize the sampling of sensors
and the invocation of actuators with its own
operation, in order to guarantee that Meta sees
and alters only locally consistent states.

An instrumented program is an example of a 
Meta context, that is, a named set of sensors and
actuators. In Meta, each context belongs to a 
single context class that defines the types of its 
sensors and actuators. For example, if we as- 
sume that only one computation server will be 
run on any given machine, then the load sen- 
sor of a computation server running on a ma- 
chine grimnir could be named serv(grimnir).load, 
where the context is named serv(grimnir) and is 
of a context class named serv. 
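
As a rough illustration of the instrumentation step, the following Python sketch models a context as a named registry of sensors and actuators. It is not the Meta library interface: the names Context, register_sensor, register_actuator, sample, and actuate are hypothetical, the type signatures are plain strings, and the synchronization routines that Meta provides are omitted entirely.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Context:
    """Hypothetical stand-in for a Meta context: a named set of sensors and actuators."""
    name: str
    sensors: Dict[str, Tuple[str, Callable[[], Any]]] = field(default_factory=dict)
    actuators: Dict[str, Tuple[str, Callable[..., bool]]] = field(default_factory=dict)

    def register_sensor(self, name: str, signature: str, fn: Callable[[], Any]) -> None:
        # A sensor is a side-effect-free procedure that exposes part of the state.
        self.sensors[name] = (signature, fn)

    def register_actuator(self, name: str, signature: str, fn: Callable[..., bool]) -> None:
        # An actuator changes the state and reports success (True) or failure (False).
        self.actuators[name] = (signature, fn)

    def sample(self, name: str) -> Any:
        return self.sensors[name][1]()

    def actuate(self, name: str, *args: Any) -> bool:
        return self.actuators[name][1](*args)


class ComputeServer:
    """Toy computation server: jobs are executed in the order received."""

    def __init__(self) -> None:
        self.jobs: List[Tuple[str, int]] = []  # (job name, estimated time)

    def submit(self, job_name: str, est_time: int) -> None:
        self.jobs.append((job_name, est_time))

    def load(self) -> int:
        # Load is the estimated time needed to complete all submitted jobs.
        return sum(est for _, est in self.jobs)

    def queue(self) -> List[str]:
        return [name for name, _ in self.jobs]

    def cancel(self, job_name: str) -> bool:
        for i, (name, _) in enumerate(self.jobs):
            if name == job_name:
                del self.jobs[i]
                return True
        return False


def instrument(server: ComputeServer, machine: str) -> Context:
    """Build the context serv(machine) for one server (assumed one server per machine)."""
    ctx = Context(f"serv({machine})")
    ctx.register_sensor("load", "integer", server.load)
    ctx.register_sensor("queue", "{string}", server.queue)
    ctx.register_actuator("cancel", "(string)", server.cancel)
    return ctx
```

With this sketch, the sensor written serv(grimnir).load in the text corresponds to instrument(server, "grimnir").sample("load").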


Instrumented programs define what is called 
in Meta base contexts. Base contexts can be
grouped into group contexts.¹ Put another way,
a group context class can be defined as a col- 
lection of contexts from the same base context 
class, and a base context can join and leave any 
number of group contexts of a compatible class. 
Each sensor of the base context class exists in 
the group context class except that the type of 
the sensor is promoted to a set value. For example,
assume that service(1) is a group context of
class service, composed of serv contexts. If the
load sensor in the context class serv has an integer
type, then the service context class also has a load
sensor, but its type is set of integers. The value of this
sensor in some group context is the set of load 
sensor values, one for each base context that is a 
member of the group context. 

Similarly, the actuation of service(1).stop will
actuate serv(x).stop for every serv(x) that is a
member of service(1). The value of the actuation
is success if all base actuations succeed;
otherwise it is failure. Actuators in group context
classes can also take two additional parameters:
a positive integer and a set of values obtained
from a sensor of the group. The first parameter
specifies a number k of base contexts, and the
second specifies a preference ranking r of the
base contexts, each identified as the source of
one of the values. The actuation invokes the
actuator on the first k contexts denoted by r. For
each that returns failure, an additional context
is chosen from r. The group actuation returns
success if k base contexts return success. For
example, service(1).shutdown(2, sort(load)) will
shut down the two lightest-loaded servers that
are members of service(1).
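
The group semantics above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class Base and the functions group_sensor, sort_by_value, and group_actuate are hypothetical names, sensor values are tagged with the name of their source context so that a ranking can be derived from them, and the actuation is performed sequentially rather than in a distributed, replicated manner.

```python
from typing import Any, List, Optional, Tuple


class Base:
    """Toy base context with a load sensor and a shutdown actuator."""

    def __init__(self, name: str, load: int, healthy: bool = True) -> None:
        self.name = name
        self.load = load
        self.healthy = healthy  # an unhealthy server fails every actuation
        self.running = True

    def shutdown(self) -> bool:
        if not self.healthy:
            return False
        self.running = False
        return True


def group_sensor(members: List[Base], sensor: str) -> List[Tuple[str, Any]]:
    """Promote a base sensor to the group: one (source, value) pair per member."""
    return [(m.name, getattr(m, sensor)) for m in members]


def sort_by_value(tagged: List[Tuple[str, Any]]) -> List[str]:
    """Preference ranking r: source names ordered by ascending sensor value."""
    return [src for src, _ in sorted(tagged, key=lambda sv: sv[1])]


def group_actuate(members: List[Base], actuator: str,
                  k: Optional[int] = None,
                  ranking: Optional[List[str]] = None) -> bool:
    if k is None:
        # Plain group actuation: actuate every member, succeed only if all succeed.
        results = [getattr(m, actuator)() for m in members]
        return all(results)
    by_name = {m.name: m for m in members}
    successes = 0
    # Walk the ranking in order; every failure is replaced by the next candidate.
    for name in ranking or []:
        if successes == k:
            break
        if getattr(by_name[name], actuator)():
            successes += 1
    return successes == k
```

Under these assumptions, group_actuate(servers, "shutdown", 2, sort_by_value(group_sensor(servers, "load"))) plays the role of service(1).shutdown(2, sort(load)).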

Control programs are written in a simple programming
language called NPL. An NPL command is equivalent
to an atomic guarded command
$(\phi_1 \rightarrow \alpha_1 \,\|\, \ldots \,\|\, \phi_m \rightarrow \alpha_m)$,
where each $\phi_i$ is a predicate expression over sensor
values and each $\alpha_i$ is a sequence of actuator
invocations whose parameters can be expressions of sensor
values. The meaning of such a command is that
it blocks until some $\phi_i$ is true, at which point the
corresponding $\alpha_i$ executes, and any effects of $\alpha_i$
are not visible to other guarded commands until
$\alpha_i$ terminates. Such commands can be one-shot
(once an $\alpha_i$ executes the command terminates)
or iterative (once an $\alpha_i$ executes the command
resumes waiting for a predicate to become true).
Meta also guarantees that an NPL command observes
a valid sequence of global states. That is,
not only is each global state used to evaluate a
$\phi_i$ a valid global state [1], but the sequence of
states is also consistent with the actual run of
the environment [2].

¹ A group context is called an aggregate in Meta. Meta's
terminology for contexts and context classes is somewhat
confusing, however, so we use the (hopefully clearer)
Lomita terminology here.
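
The one-shot and iterative forms of a guarded command can be sketched in Python as shown below. This is a single-threaded illustration, not NPL or the Meta interpreter: the class GuardedCommand and its step method are hypothetical, a state is a plain dictionary of sensor values, and when several guards hold at once the sketch simply takes the first one, which is an assumption rather than something the text specifies. Atomicity is trivial here because nothing else runs while an action executes.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Guard = Callable[[State], bool]
Action = Callable[[State], None]


class GuardedCommand:
    """Sketch of (phi_1 -> alpha_1 || ... || phi_m -> alpha_m)."""

    def __init__(self, branches: List[Tuple[Guard, Action]], iterative: bool) -> None:
        self.branches = branches
        self.iterative = iterative
        self.done = False

    def step(self, state: State) -> bool:
        """Evaluate the guards against one observed state.

        Runs at most one enabled action and returns True if an action ran.
        A one-shot command terminates after its first action; an iterative
        command goes back to waiting.
        """
        if self.done:
            return False
        for guard, action in self.branches:
            if guard(state):
                action(state)
                if not self.iterative:
                    self.done = True
                return True
        return False
```

Feeding the command a sequence of observed states with step stands in for blocking until some guard becomes true.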

Each context has associated with it an inter- 
preter of NPL commands. For base contexts, 
the interpreter resides in the same address space 
as the instrumented program. An NPL com- 
mand can be run in any interpreter (that is, an 
NPL program using fully-qualified names can be 
submitted to any context without changing its 
meaning), although the latency due to network 
communication is large — a command may run up 
to 500 times slower in a remote context than in 
a local context. Of course, some programs re- 
fer to more than one context and so must refer 
to some remote sensor or actuator no matter in 
which context they are run. 

Interpreters for group contexts are created by 
informing an interpreter that it should also im- 
plement the group context. For example, the 
interpreter for serv(grimnir) can be told to also 
implement the service(l) context. In addition, 
more than one interpreter can be so informed, 
in which case they run in a replicated mode:
even if an interpreter fails, the context will
remain accessible and the NPL commands it is 
running will continue to run. 

2 Lomita 

Although Meta is a powerful system, it is extremely
awkward to use. The NPL programs one
writes for even simple control tasks are very
hard to read and to validate. Our goal with Lomita
is to provide enough syntax and supporting
semantics to make Meta usable.

The central idea of Lomita is to fully im- 
plement the context class abstraction. Rather 
than submitting NPL programs to contexts, one 
writes a description of the context classes which 
includes a set of atomic commands (in a syn- 
tax much more readable than NPL). The Lomita 
runtime system then ensures that contexts are 
initialized and recovered with the appropriate 
NPL commands. 

Lomita consists of two parts. First, there is 
a compiler that takes Lomita programs and pro- 
duces an object file. Second, there is a replicated 
fault-tolerant service called the Lomita runtime 
that, when given a Lomita object file, loads the 
file into an internal database. The runtime moni- 
tors the currently active contexts and downloads 
the relevant NPL commands from its internal 
database when necessary. The runtime also cre- 
ates interpreters for group contexts when they 
are needed. 

A Lomita program consists of a set of context 
class definitions. Each context class definition 
specifies the attributes of the context class and 
lists the rules to be run in each context of that 
context class. An attribute can be a Meta sensor
or actuator, a function, or the Lomita key
construct.

The example in Section 3 gives several context 
class definitions. For example, the definition of 
the machine context declares that there is an in- 
strumented program that supplies sensors on the 
load of the machine and on who is logged in, and 
extends this context class with some additional 
sensors, such as when the machine is to be
considered “busy”. The definition also contains a
single rule that initializes a value by invoking
the stop_server actuator.

There are three different kinds of context 
classes that can be declared in a Lomita program:
the global context class, base context classes,
and group context classes. Each context class
defines a set of attributes and rules
that apply to all contexts of that class. Base 
context classes and group context classes correspond
to their equivalents in Meta. The global
context class contains a single context, called the
global context. The attributes defined in the 
global context are available in all contexts. For 
example, every context has its own print actua- 
tor, and so print is defined as an actuator of the 
global context. 

A Lomita rule has the following syntax:

    if/when predicate expression
        do sequence of actuator invocations end
    [ else if/when predicate expression
        do sequence of actuator invocations end ]*

By default, a Lomita rule is translated into
an iterative guarded command, but a programmer
can stop iteration by using the exit actuator.
The difference between “if” and “when”
corresponds to whether the action is enabled in
any state satisfying the predicate expression or
only in a state in which the predicate becomes
true. For example, the Lomita rule

    when "marzullo" in login
        do print("watch out!") end

prints the message “watch out!” once after each
time “marzullo” logs in, while the rule

    if "marzullo" in login
        do print("watch out!") end

continuously prints the message “watch out!” as
long as “marzullo” is logged in.

Group context classes can also specify rules
that are to be run in the base context of all members
of a group. Such rules are specified by a
with statement, which has the following syntax:

    with expression/all
        [ select when predicate expression /
          remove when predicate expression /
          rule ]* end

The expression following the with keyword is
called the key expression and, when evaluated in
the base context, yields the value of the key
associated with the group context. A select statement
generates a rule for joining the group and
a remove statement generates a rule for leaving
the group. For example, consider the following
definition of a group of machines:

    free_machines: machine group
        attributes
            key gp: string
        end

        with type
            select when !busy
            remove when busy
            if timer(10000)
                do print(name,
                    " has been free for 10 seconds.")
            end
        end

The key for the group is the value of the type 
sensor, which yields the type of instrumented 
machine. Hence, this context class partitions 
machines into group contexts all containing the 
same type of machine. The rule in the with 
statement is run in each machine context that 
is a member of a free_machines context — in this 
case, a free machine will print every ten seconds 
that it is a free machine. 
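
The difference between “if” and “when” rules described earlier in this section can be sketched in Python as level-triggered versus edge-triggered evaluation. The class Rule and its observe method are hypothetical names used only for illustration, and the sketch assumes that the predicate counts as false before the first observed state, so a “when” rule fires if its predicate already holds in that first state.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


class Rule:
    """Sketch of a Lomita rule evaluated over a sequence of observed states."""

    def __init__(self, kind: str, predicate: Callable[[State], bool],
                 action: Callable[[State], None]) -> None:
        if kind not in ("if", "when"):
            raise ValueError("kind must be 'if' or 'when'")
        self.kind = kind
        self.predicate = predicate
        self.action = action
        self._previous = False  # assumption: predicate is false before the first state

    def observe(self, state: State) -> bool:
        """Run the action if the rule is enabled in this state; return True if it ran."""
        current = bool(self.predicate(state))
        if self.kind == "if":
            fire = current                          # enabled in any satisfying state
        else:
            fire = current and not self._previous   # enabled only when it becomes true
        self._previous = current
        if fire:
            self.action(state)
        return fire
```

For a login sensor that changes over time, the “when” form fires once per login of “marzullo”, while the “if” form fires in every observed state in which “marzullo” is logged in.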

3 Example 

The following is a complete Lomita 1.0 program.
The program serv services a simple request for
computation (the computation is given a name
and an estimated amount of time). An instrumented
server is a member of the context class
serv, and the context is named by the machine it
runs on (e.g., serv(ydalir)). Servers are grouped
into two groups: the group of all servers, and the
group of servers that are not overloaded (called
freeservers). Furthermore, the new actuator
add1 defined in freeservers submits a job to the
lightest-loaded free server.

A set of rules, associated with the group of all 
servers, governs the number of server replicas. 
These rules specify that the number of replicas 
must be between min_rep and max_rep. Furthermore,
if the average load of the servers is too
high, then a new server is created, and if the av- 
erage load of the servers is too low and there is a 
server with no jobs, then that server is deleted. 

#define high_load 5.0
#define max_users 2
#define dally 30
#define max_load 30
#define min_load 2
#define max_rep 5
#define min_rep 1
#define serv_cmd "/usr/meta/utils/serv &"
#define has_server get1
#define set_has_server set(1, TRUE)
#define set_no_server set(1, FALSE)
#define wait_new_size get1
#define set_new_size set(1, TRUE)
#define reset_new_size set(1, FALSE)
#define last_nservers get2
#define set_nservers set(2, num_servers)

global attributes
    sensor get1: boolean
    sensor get2: integer
    function avg(any): any
    function sort({any}): {any}
    function timer(integer): boolean
    function select_eq_int(
        {integer}, integer): {integer}
    actuator exit
    actuator set(integer, any)
    actuator print(any)
    actuator shell(any)
end

machine: base
    attributes
        key name: string
        sensor load: real
        sensor alive: boolean
        sensor busy: boolean := load > high_load
            || size(login) > max_users
            || has_server
        sensor login: {string}
        actuator exec(cmd: string)
        actuator start_server :=
            exec(serv_cmd);
            set_has_server;
            leave("freemachines")
        actuator stop_server := set_no_server
    end

    if true do stop_server; exit end
end

serv: base
    attributes
        key name: string
        sensor load: integer
        sensor alive: boolean
        sensor queue: {string}
        sensor overload: boolean :=
            load > max_load
        actuator add(
            job_name: string, job_time: string)
        actuator remove(job_name: string)
        actuator shutdown
        actuator stop := shutdown;
            machine(name).stop_server
    end
end

/* all machines that aren't busy */
freemachines: machine group
    attributes
        key not_needed
        sensor mean_load: real := avg(load)
        sensor num_freemachines := size(alive)
        actuator start_server(
            number: integer, pref: any)
    end

    with all
        select when !busy
        remove when busy
        if timer(dally*1000)
            do print(
                name,
                " has been free for ",
                dally, " seconds.") end
    end
end

/* all servers that aren't overloaded */
/* actuator add1 submits job to lightest */
/* loaded server. */
freeservers: serv group
    attributes
        key not_needed
        sensor num_freeservers := size(alive)
        actuator add(
            number: integer, pref: {integer},
            job_name: string, job_time: string)
        actuator add1(
            jname: string, jtime: string) :=
            add(1, sort(load), jname, jtime);
    end

    with all
        select when !overload
        remove when overload
    end
end





/* All servers. Create a server if the */
/* average load is too high, and destroy */
/* an idle server if the average load is */
/* too low. */
servers: serv group
    attributes
        key not_needed
        sensor num_servers := size(alive)
        actuator add(
            number: integer, pref: {integer},
            job_name: string, job_time: string)
        actuator stop(
            number: integer, pref: {integer})
    end

    with all select all end

    if true do set_new_size; exit end

    when num_servers <> last_nservers
        do set_nservers; set_new_size end

    if wait_new_size
        && (freemachines.num_freemachines > 0)
        && (num_servers == BOTTOM
            || num_servers < min_rep
            || (avg(load) > max_load
                && num_servers < max_rep))
        do freemachines.start_server(
            1, sort(load));
            reset_new_size end

    if wait_new_size
        && (num_servers > max_rep
            || (avg(load) < min_load
                && num_servers > min_rep))
        do stop(1, select_eq_int(load, 0));
            reset_new_size end
end
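
The two replica-control rules of the servers group can be paraphrased in Python as the decision function below. This is a sketch, not a translation of the Lomita runtime: the names decide and idle_servers are hypothetical, the wait_new_size gating is left out, and BOTTOM is read here as “no value available” and modeled as an empty list of loads, which is an assumption about its meaning.

```python
from typing import List, Optional

# Constants taken from the #define lines of the listing.
MAX_LOAD = 30
MIN_LOAD = 2
MAX_REP = 5
MIN_REP = 1


def decide(loads: List[int], num_free_machines: int) -> Optional[str]:
    """Return "start", "stop", or None for one observed state.

    loads holds one load value per running server; an empty list stands in
    for num_servers == BOTTOM.
    """
    if not loads:
        return "start" if num_free_machines > 0 else None
    n = len(loads)
    avg = sum(loads) / n
    # Create a server: too few replicas, or overloaded with room to grow.
    if num_free_machines > 0 and (n < MIN_REP or (avg > MAX_LOAD and n < MAX_REP)):
        return "start"
    # Destroy a server: too many replicas, or underloaded with room to shrink.
    if n > MAX_REP or (avg < MIN_LOAD and n > MIN_REP):
        return "stop"
    return None


def idle_servers(loads: List[int]) -> List[int]:
    """Indices of servers with no pending work, mirroring select_eq_int(load, 0)."""
    return [i for i, load in enumerate(loads) if load == 0]
```

A caller that receives "stop" would pick its victim from idle_servers(loads); if that list is empty, the corresponding group actuation in the listing has no candidate to stop.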